The need to have a large amount of parallel data is a large hurdle for the practical use of voice conversion (VC). This paper presents a novel framework of exemplar-based VC that only requires a small number of parallel exemplars. In our previous work, a VC technique using non-negative matrix factorization (NMF) for noisy environments was proposed. This method requires parallel exemplars (which consist of the source exemplars and target exemplars that have the same texts uttered by the source and target speakers) for dictionary construction. In the framework of conventional Gaussian mixture model (GMM)-based VC, some approaches that do not need parallel exemplars have been proposed. However, in the framework of exemplar-based VC for noisy environments, such a method has never been proposed. In this paper, an adaptation matrix in an NMF framework is introduced to adapt the source dictionary to the target dictionary. This adaptation matrix is estimated using only a small parallel speech corpus. We refer to this method as affine NMF, and the effectiveness of this method has been confirmed by comparing its effectiveness with that of a conventional NMF-based method and a GMM-based method in noisy environments.
Loading....